The Value of Physiological Scoring Criteria in Predicting the In-Hospital Mortality of Acute Patients; a Systematic Review and Meta-Analysis

Introduction: There is no comprehensive meta-analysis on the value of physiological scoring systems in predicting the mortality of critically ill patients. Therefore, the present study intended to conduct a systematic review and meta-analysis to collect the available clinical evidence on the value of physiological scoring systems in predicting the in-hospital mortality of acute patients. Method: An extensive search was performed on Medline, Embase, Scopus, and Web of Science databases until the end of year 2020. Physiological models included Rapid Acute Physiology Score (RAPS), Rapid Emergency Medicine Score (REMS), modified REMS (mREMS), and Worthing Physiological Score (WPS). Finally, the data were summarized and the findings were presented as summary receiver operating characteristics (SROC), sensitivity, specificity and diagnostic odds ratio (DOR). Results: Data from 25 articles were included. The overall analysis showed that the area under the SROC curve of REMS, RAPS, mREMS, and WPS criteria were 0.83 (95% CI: 0.79-0.86), 0.89 (95% CI: 0.86-0.92), 0.64 (95% CI: 0.60-0.68) and 0.86 (95% CI: 0.83-0.89), respectively. DOR for REMS, RAPS, mREMS and WPS models were 11 (95% CI: 8-16), 13 (95% CI: 4-41), 2 (95% CI: 2-4) and 17 (95% CI: 5-59) respectively. When analyses were limited to trauma patients, the DOR of the REMS and RAPS models were 112 and 431, respectively. Due to the lack of sufficient studies, it was not possible to limit the analyses for mREMS and WPS. Conclusion: The findings of the present study showed that three models of RAPS, REMS and WPS have a high predictive value for in-hospital mortality. In addition, the value of these models in trauma patients is much higher than other patient settings.


Introduction
Trauma is one of the most important causes of mortality and disability in societies, especially in developing countries (1). Statistics show that trauma and accidents are the third leading cause of death in the entire population of Iran and the leading cause of death among young people (2). Since the young population constitutes the majority of casualties, the burden of trauma and accidents is far greater than that of many infectious and non-communicable diseases. The extent of the problem is such that, according to the World Health Organization, up to 50% of people who are hospitalized due to unintentional accidents are discharged with some form of disability (3). Studies show that if the severity of trauma and injury is diagnosed quickly, the mortality rate and the resulting disability will be significantly reduced (4). For this purpose, diagnostic modalities such as CT scan, magnetic resonance imaging, ultrasound, and chest X-ray are used to identify the severity of injury in the clinic. However, because access to this equipment is often limited, some of these tests carry a risk of radiation exposure, and each has diagnostic limitations (for instance, the low diagnostic value of chest X-ray in identifying pneumothorax and the dependence of ultrasound on the skill of the operator), researchers have long sought other ways to classify patients. One of these methods is the use of scoring models based on clinical examinations. These models, known as scoring systems, have been studied for decades and have been gradually modified. However, the use of these models has always been associated with disadvantages and limitations (5). For example, many of the introduced models and their scoring methods are complex to calculate and, in some cases, their validity has not been examined in different clinical conditions. Therefore, research in this field is still ongoing and a number of new models have been presented lately.
In recent years, health departments have proposed the establishment of physiological scoring systems to identify patients at high risk for mortality so that the management of these trauma patients can be better structured, thereby reducing the burden of trauma (6). Based on this, several physiological scoring systems were developed and provided to researchers, such as the early-warning scoring system (7)(8)(9)(10), Worthing Physiological Scoring System (WPSS) (11), Rapid Emergency Medicine Score (REMS), Acute Physiology and Chronic Health Evaluation (APACHE) II, and Revised Trauma Score. Nevertheless, there is still no definitive conclusion as to whether the use of physiological scoring systems can reliably predict the outcome of trauma patients. One of the ways to answer such a question is to conduct a systematic review and meta-analysis on the matter. In this regard, a meta-analysis of 29 studies on poisoning patients, conducted in 2017, showed that the APACHE II score in deceased patients was significantly lower than in living patients. The best cut-off point for APACHE II was 10, at which the score had a sensitivity of 88% and a specificity of 84% (12). Another meta-analysis, by Hamilton et al. in 2018, examined the diagnostic value of the early warning scoring system in septic patients. In this analysis, which was performed on 6 studies, it was found that this scoring system cannot accurately predict the mortality of patients with sepsis (13). However, research in this field is still open and systematic reviews are being conducted (14,15). Although many studies have been performed in this field in recent years, a comprehensive meta-analysis has not yet been performed on other physiological scoring systems (16)(17)(18)(19).
Based on this, the researchers of the present study intended to conduct a systematic review and meta-analysis to collect the available clinical evidence on the value of physiological scoring criteria in predicting the in-hospital mortality of patients. The studied physiological criteria included Rapid Acute Physiology Score (RAPS), Rapid Emergency Medicine Score (REMS), modified REMS (mREMS), and Worthing Physiological Score (WPS). Although the initial objective of the present study focused only on trauma patients, in the end, in addition to studies performed on trauma patients, other causes of acute hospitalization including infection and sepsis were also included.

Study design
The aim of this study was to evaluate the value of physiological scoring models in predicting the in-hospital outcomes of acute hospitalized patients. In the present study, the MOOSE guideline was used, which is a guide for performing systematic review and meta-analysis in observational research (20).

Definition of PICO
The problem or population studied (P) includes human studies performed on acute hospitalized patients. Index (I) is the physiological scoring models including RAPS, REMS, mREMS and WPS. Comparisons (C) are made with the living group and the assessed outcome (O) is patients' mortality.

Search strategy
To achieve the objectives of the present study, an extensive search was conducted in electronic databases and related article sources. A grey literature search was also undertaken. The search of electronic databases was carried out systematically under the supervision of a researcher with expertise in systematic reviews. At this stage, related keywords were selected using the MeSH and Emtree databases, consultation with experts, and a search of the titles and abstracts of related articles. The search strategy for each database was then defined according to that database's search guidelines. The approach to searching and summarizing the data has been reported in previous meta-analyses by the present study's researchers (21)(22)(23)(24)(25)(26)(27)(28)(29)(30).

Selection criteria
Human diagnostic studies performed to assess the value of physiological scoring models and their predictive power regarding patients' outcomes were included. The study population comprised human subjects with no age, sex, or racial restrictions. The exclusion criteria were case reports, case series, review articles, failure to evaluate the index test against the reference standard, and not following up patients until their discharge from the hospital.

Data extraction
Screening and summarizing of articles, entry of their data into the checklist, and the final quality control were carried out by two independent researchers. Any disagreement was resolved through discussion with a third researcher. Articles were summarized based on a checklist designed according to the PRISMA statement guidelines (31). The extracted data included information related to the study design, sample characteristics (age, sex, mechanism of injury), number of samples examined, outcome, and possible biases. If two or more articles were based on the same dataset, the study with the largest sample size or the longest follow-up time was included. If the required data were not provided in the article, they were requested by contacting the corresponding author. If data were recorded separately for different subgroups (such as sex or age), they were entered in our study in the same way.
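The rule for overlapping datasets can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`dataset`, `n`, `follow_up_days`) are hypothetical, and because the text does not say how sample size and follow-up time were weighed against each other, the sketch assumes sample size is compared first and follow-up time breaks ties.

```python
def select_one_per_dataset(records):
    """Keep one record per dataset.

    Assumption (not stated in the text): the largest sample size wins,
    and the longest follow-up is used only to break ties.
    """
    best = {}
    for rec in records:
        key = rec["dataset"]
        rank = (rec["n"], rec["follow_up_days"])
        if key not in best:
            best[key] = rec
        elif rank > (best[key]["n"], best[key]["follow_up_days"]):
            best[key] = rec
    return list(best.values())


# Hypothetical records; none of these correspond to included studies.
records = [
    {"id": "A", "dataset": "registry-1", "n": 500, "follow_up_days": 30},
    {"id": "B", "dataset": "registry-1", "n": 800, "follow_up_days": 14},
    {"id": "C", "dataset": "registry-2", "n": 300, "follow_up_days": 30},
]
selected = select_one_per_dataset(records)
selected_ids = sorted(rec["id"] for rec in selected)
```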

Risk of bias of articles
The quality was assessed using QUADAS-2 instructions (32). To evaluate the agreement between the two researchers, inter-rater reliability was examined in the qualitative evaluation of the studies. In case of disagreement, the dispute was resolved through discussion with a third researcher.
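The text does not name the statistic used for inter-rater reliability. As one common choice, Cohen's kappa for two raters could be computed as in the sketch below; the ratings shown are invented for illustration and are not taken from the included studies.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must judge the same non-empty item list")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


# Hypothetical risk-of-bias judgements for six studies.
rater_1 = ["low", "low", "high", "unclear", "low", "high"]
rater_2 = ["low", "low", "high", "low", "low", "unclear"]
kappa = cohens_kappa(rater_1, rater_2)
```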

Statistical analyses
Analyses were performed using the STATA 14.0 statistical program. All studies were summarized and categorized based on patients' outcomes (dead or alive), and true positives, true negatives, false positives, and false negatives were recorded accordingly. Analyses were performed using the "midas" command. Using its sub-commands, the area under the SROC curve of each scoring model, as well as its sensitivity, specificity, and diagnostic odds ratio (DOR) with 95% confidence interval (95% CI), were calculated. A random effect model was used when heterogeneity was present and a fixed effect model when it was absent. The I2 test was used to evaluate the heterogeneity between studies. In cases of heterogeneity, meta-regression and subgroup analysis were performed to determine the cause of heterogeneity. Finally, the results of the studies were pooled and an overall effect size was presented. Deeks' funnel plot was used to identify publication bias (33).
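To make the per-study quantities concrete, the following Python sketch derives sensitivity, specificity, and DOR with a Wald-type 95% CI from a single 2x2 table. It is not the pooled bivariate model fitted by "midas"; the counts are hypothetical, and the 0.5 continuity correction for zero cells is an assumption of this sketch rather than something stated in the text.

```python
import math


def diagnostic_accuracy(tp, fp, fn, tn, z=1.96):
    """Per-study sensitivity, specificity and DOR with a Wald-type CI."""
    sensitivity = tp / (tp + fn)  # deaths correctly flagged as high risk
    specificity = tn / (tn + fp)  # survivors correctly flagged as low risk
    # Assumption: add 0.5 to every cell when any cell is zero,
    # so that the odds ratio and its standard error are defined.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    dor = (tp * tn) / (fp * fn)
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci_low = math.exp(math.log(dor) - z * se_log_dor)
    ci_high = math.exp(math.log(dor) + z * se_log_dor)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "ci_low": ci_low,
        "ci_high": ci_high,
    }


# Hypothetical study: 100 deaths and 900 survivors at one cut-off point.
result = diagnostic_accuracy(tp=80, fp=200, fn=20, tn=700)
```

With these invented counts the sensitivity is 0.80 and the DOR is 14; the interval is computed on the log scale and then exponentiated.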

Characteristics of the included studies
Our search yielded 158 non-duplicate articles. Of these, 77 potentially eligible articles were studied in more detail and finally, 25 articles were included in the present meta-analysis (42-66) (Figure 1). There were 11 prospective cohort studies, 11 retrospective cohort studies, 1 case-control study, and 2 cross-sectional studies. These studies included 737,351 patients (47.16% male), of whom 23,149 (3.14%) died. There were 6 studies on trauma patients, 9 studies on sepsis/infection patients, 5 studies on all acute conditions (mixed population) and 5 studies on non-trauma patients. Table 1 shows the characteristics of the included studies.
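The figures in this paragraph can be cross-checked with simple arithmetic. The short Python sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Totals reported in the text.
total_patients = 737_351
deaths = 23_149
mortality_pct = 100 * deaths / total_patients  # rounds to 3.14

# Study counts by design and by patient setting, as reported in the text.
studies_by_design = {
    "prospective cohort": 11,
    "retrospective cohort": 11,
    "case-control": 1,
    "cross-sectional": 2,
}
studies_by_setting = {
    "trauma": 6,
    "sepsis/infection": 9,
    "all acute conditions": 5,
    "non-trauma": 5,
}

# Both breakdowns should account for all 25 included articles.
design_total = sum(studies_by_design.values())
setting_total = sum(studies_by_setting.values())
```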

Meta-analysis
The diagnostic value of REMS in predicting the in-hospital mortality
Twenty-one articles were included in the evaluation of the diagnostic value of REMS in predicting the in-hospital mortality of patients. These 21 articles contained 38 separate analyses in terms of different cut-off points. The total number of patients in these 21 studies was 578,373, of whom 10,862 died. The cut-off points for this model varied between 3 and 11. Overall analysis showed that the area under the ROC curve, regardless of the cut-off points, was 0.83 (95% CI: 0.79 to 0.86). Overall sensitivity, specificity, and DOR of the REMS model in predicting the in-hospital mortality were 0.83 (95% CI: 0.75 to 0.88), 0.71 (95% CI: 0.63 to 0.77), and 11 (95% CI: 8 to 16), respectively (Figure 2-A). Nevertheless, there was significant heterogeneity among studies (I2 = 100.0%); therefore, meta-regression was performed. Meta-regression showed that the most important sources of heterogeneity between studies were the use of different cut-off points, the difference in study design (retrospective and prospective), and different patient settings (Table 2). Stratification of the analysis based on these differences between studies caused a significant reduction in heterogeneity, to the point that I2 was zero in some subgroups. Accordingly, the findings were reported separately for these subgroups. The first and most important factor influencing the prognostic value of REMS in predicting the in-hospital mortality was the different cut-off points between studies. Based on meta-regression, the cut-off points were divided into three groups: REMS scores ≤5 (categories with sensitivity higher than 90%), REMS scores of 6 to 8 (categories with sensitivity between 70% and 89%) and REMS scores ≥9 (categories with sensitivity lower than 70%). The area under the ROC curve of the REMS model at the cut-off scores ≤5, 6 to 8, and ≥9 was 0.87 (95% CI: 0.84 to 0.90), 0.83 (95% CI: 0.79 to 0.86), and 0.80 (95% CI: 0.76 to 0.83), respectively (Figure 2-B to 2-D).
In the evaluation of DOR between subgroups, it was found that the classification of patients based on the REMS ≤5 cut-off point had more clinical value than other cut-offs; at this cut-off, the DOR of REMS in predicting the in-hospital mortality was 27, considerably higher than at cut-offs of 6 to 8 (DOR = 9) and ≥9 (DOR = 7) (Table 3). In evaluating the role of difference in the type of study, it was found that 36 analyses had a cohort design, while 1 had a case-control design and 1 had a cross-sectional design. Therefore, subgroup analysis was not useful for this factor. Another point obtained in subgroup analysis was the role of study design (retrospective versus prospective) in the predictive value of REMS. As Table 3 shows, the DOR of REMS was 15 in prospective studies and 9 in retrospective studies. The setting of patients in the included studies was another factor influencing the findings on the predictive value of REMS. In this section, 4 analyses were performed on trauma patients, 9 analyses were performed on patients with sepsis/infection, 4 analyses were performed on non-trauma acute surgery, and 21 analyses were performed on all acute conditions. The interesting point was the very high prognostic value of REMS in trauma patients. The DOR of REMS was 112 in predicting the in-hospital mortality of trauma patients, while in other patient settings the DOR was much lower (DOR = 9 in sepsis/infection, DOR = 20 in the non-trauma setting, and DOR = 8 in all acute settings) (Table 3).
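The I2 statistic quoted above expresses the share of between-study variability that exceeds what chance alone would produce. The source does not detail how "midas" derives it, so the Python sketch below uses the standard definition based on Cochran's Q with inverse-variance weights; the effect sizes (log DOR) and variances are hypothetical.

```python
def i_squared(effects, variances):
    """I2 (in percent) from Cochran's Q, using inverse-variance weights."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0  # no excess variability beyond chance
    return 100 * (q - df) / q


# Hypothetical log DOR values: two very different, precisely estimated studies.
heterogeneous = i_squared([1.0, 3.0], [0.04, 0.04])
# Three studies reporting the same effect.
homogeneous = i_squared([2.0, 2.0, 2.0], [0.1, 0.2, 0.3])
```

In the first hypothetical case Q is 50 with 1 degree of freedom, so I2 is 98%; in the second, Q is 0 and I2 is 0%, which mirrors how stratified subgroups can reach an I2 of zero.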

The diagnostic value of RAPS in predicting the in-hospital mortality
In the evaluation of the diagnostic value of RAPS in predicting the in-hospital mortality, 8 articles were included, which contained 12 separate analyses in terms of different cut-off points. The total number of patients in these 8 studies was 55,052, of whom 710 died. The cut-off points presented for this model in the studies varied between 2 and 8. The area under the ROC curve of RAPS in predicting the in-hospital mortality without considering the cut-off points was 0.89 (95% CI: 0.86 to 0.92) (Figure 3-A). The overall sensitivity, specificity, and DOR of RAPS in predicting the in-hospital mortality were 0.82 (95% CI: 0.63 to 0.92), 0.83 (95% CI: 0.74 to 0.90) and 13 (95% CI: 4 to 41), respectively. However, there was significant heterogeneity between studies (I2 = 100.0%). In order to find the source of heterogeneity, meta-regression analysis was performed. Meta-regression showed that, similar to REMS, the most important sources of heterogeneity between studies in the RAPS analysis were the use of different cut-off points, differences in study design (retrospective and prospective), type of study (cohort, case-control and cross-sectional), and different patient settings (Table 2). Stratification of analyses based on these differences between studies caused a significant reduction in heterogeneity, to the point where I2 was equal to zero in some subgroups. Accordingly, the findings were reported separately for these subgroups. The first and most important factor influencing the prognostic value of RAPS in predicting the in-hospital mortality was the different cut-off points used between studies. In this section, RAPS cut-off points were divided into three groups: RAPS scores ≤3, RAPS score equal to 4, and RAPS scores of 7 to 8. The area under the ROC curve of the RAPS model at the cut-off points ≤3, 4, and 7 to 8 was 0.93 (95% CI: 0.90 to 0.95), 0.81 (95% CI: 0.77 to 0.84), and 0.94 (95% CI: 0.91 to 0.96), respectively (Figure 3-B to 3-D).
In the comparison of DOR between subgroups, it was found that the classification of patients based on RAPS scores of 7 to 8 had a higher clinical value than other cut-off points, since the DOR of RAPS in predicting the in-hospital mortality was 69 at this cut-off, which is higher than when cut-off points of 4 (DOR = 9) and ≤3 (DOR = 42) are used (Table 4). In examining the role of difference in the type of study on the predictive value of RAPS, it was found that 8 of the 10 analyses were performed as cohort studies. Therefore, subgroup analysis was not very useful for this factor. In addition, the effect of difference in study design (retrospective versus prospective) on the predictive value of RAPS was not significant. As Table 4 shows, the DOR of RAPS was 17 in prospective studies and 13 in retrospective studies, both of which indicate a high clinical value for RAPS in outcome prediction. The setting of patients in the included studies was another factor that caused heterogeneity in the findings of the RAPS section. In this part, 4 analyses were performed on trauma patients, 2 analyses were performed on patients with sepsis/infection, 3 analyses were performed on non-trauma acute surgery, and 3 analyses were performed on all acute settings. The high prognostic value of RAPS in trauma patients was notable. The DOR of RAPS in predicting the in-hospital mortality of trauma patients was 431, while the value in sepsis/infection, non-trauma setting, and all acute settings was 6, 29, and 3, respectively (Table 4).

Diagnostic value of mREMS in predicting the in-hospital mortality
In the evaluation of the diagnostic value of mREMS in predicting the in-hospital mortality, 3 articles were included, which involved 13 separate analyses in terms of different cut-off points. The total number of patients in these 3 studies was 157,749, of whom 12,110 died. The cut-off points presented for this model in the studies varied between 3 and 14.
The area under the ROC curve of mREMS in predicting the in-hospital mortality without considering the cut-off points was 0.64 (95% CI: 0.60 to 0.68). The overall sensitivity, specificity, and DOR of the mREMS model were lower than those of the REMS and RAPS scores, and were 0.74 (95% CI: 0.50 to 0.89), 0.46 (95% CI: 0.25 to 0.69) and 3 (95% CI: 2 to 4), respectively (Figure 4-A and Table 6). However, there was significant heterogeneity between studies (I2 = 100.0%); therefore, meta-regression analysis was performed. Meta-regression showed that the most important sources of heterogeneity between studies in mREMS analyses were the use of different cut-off points, differences in study design (retrospective and prospective), and different patient settings (Table 2). Stratification of analyses based on these differences between studies caused a significant reduction in heterogeneity among studies, to the point where I2 was zero in some subgroups. The findings were reported separately for these subgroups. The first and most important factor influencing the prognostic value of mREMS in predicting the in-hospital mortality was the different cut-off points used between studies. In this section, due to the small number of studies, mREMS cut-off points were divided into two groups: mREMS scores <10 and mREMS scores ≥10. The area under the ROC curve of the mREMS model at the cut-off points of <10 and ≥10 was 0.73 (95% CI: 0.69 to 0.77) and 0.62 (95% CI: 0.58 to 0.66), respectively (Figure 4-B to 4-D). DOR was not significantly different between the two subgroups (3 versus 2, respectively) (Table 5). All three studies included in this section were cohorts, so the type of study could not be the source of heterogeneity. Also, out of 13 analyses included in this section (from three studies), only 1 had a retrospective design. Therefore, the difference in study design could not be the source of heterogeneity in examining the prognostic value of mREMS.
Finally, it was found that 11 analyses of this section were performed on sepsis / infection patients, 1 analysis was performed on trauma patients, and 1 analysis was performed on all acute settings, which also showed that patients' settings could not be a source of heterogeneity ( Table 5).

The diagnostic value of WPS in predicting the in-hospital mortality
In the evaluation of the diagnostic value of WPS in predicting the in-hospital mortality, 5 articles were included, which involved 5 separate analyses. The total number of patients in these 5 studies was 10,771, of whom 786 died. The cut-off points presented for this model in the studies varied between 3 and 6. The area under the ROC curve of WPS in predicting the in-hospital mortality without considering cut-off points was 0.86 (95% CI: 0.83 to 0.89). The sensitivity and specificity of this scoring model in predicting the in-hospital mortality were 0.76 (95% CI: 0.64 to 0.85) and 0.85 (95% CI: 0.71 to 0.92), respectively. Overall, the DOR of WPS was 17 (95% CI: 5 to 59) (Figure 5). In this section, there was significant heterogeneity between studies (I2 = 89.9%). Although meta-regression was performed in this part of the analysis, due to the small number of studies, the origin of heterogeneity could not be identified and it was not possible to perform subgroup analysis (Table 2).

Publication bias
Deeks' funnel plot asymmetry test was used to examine publication bias. This analysis showed no evidence of publication bias in the relationship of REMS (p = 0.58), RAPS (p = 0.13), mREMS (p = 0.36), and WPS (p = 0.22) with in-hospital mortality.
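Deeks' test regresses each study's log DOR on the inverse square root of its effective sample size (ESS), weighting by ESS; a slope far from zero suggests small-study effects. The Python sketch below illustrates only the regression step, assuming 2x2 tables without zero cells. The tables are hypothetical, and the p-value (which requires a t distribution) is deliberately left out.

```python
import math


def deeks_regression(tables):
    """Weighted regression of log DOR on 1/sqrt(ESS), weights = ESS.

    Each table is (tp, fp, fn, tn); zero cells are assumed absent.
    Returns (slope, intercept).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dead = tp + fn   # patients who died
        n_alive = fp + tn  # patients who survived
        ess = 4 * n_dead * n_alive / (n_dead + n_alive)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical studies of different sizes that all have the same DOR (14).
tables = [(80, 200, 20, 700), (40, 100, 10, 350), (160, 400, 40, 1400)]
slope, intercept = deeks_regression(tables)
```

Because the three invented studies share one DOR regardless of size, the fitted slope is essentially zero, which is the pattern expected when there is no funnel plot asymmetry.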

Risk of bias assessment
In the quality control of articles, it was found that 1 study had a high risk of bias in the patient selection section due to its case-control design. Additionally, the quality of the index test in 12 studies was unclear. The reason for this was the retrospective nature of the studies. In retrospective studies, physiological variables such as temperature, blood pressure, etc. are collected from patients' files; so, it is not clear how accurately these variables were recorded. Moreover, in the flow and timing section these 12 studies were high-risk, because data collection was done after the outcome (death) of patients was determined. In other items, the risk of bias and applicability were low (Table 6).

Discussion
Different studies have provided different cut-off points for the classification of patients at high risk of mortality in each of the physiological scoring systems of REMS, mREMS, RAPS, and WPS. Taking into account the uncertainty in the superiority of these physiological systems over each other in different patient settings, the present meta-analysis has, for the first time, collected the available evidence about the diagnostic value, sensitivity, and specificity of these physiological systems in acute patients and has tried to investigate the best cut-off point in each scoring system. The findings of the present study showed that RAPS, REMS and WPS have a high predictive value for in-hospital mortality. A summary of the prognostic value of these models in identifying high-risk patients is presented in Table 7. Also, the value of these models in trauma patients is much higher than in other patient settings. However, the evidence on the REMS model is greater than that on the other two models, and since the DOR of this model in identifying high-risk patients is very high, it is recommended to use REMS in emergency departments. The RAPS model was proposed many years ago and its predictive value has been demonstrated in some studies in adults (34). Years after the introduction of this model, REMS was introduced to increase the value of RAPS by adding patients' age and arterial oxygen saturation level to the RAPS variables. Adding these two variables to RAPS increased its validity, and REMS was proposed in the literature as an efficient model to classify damage severity (35, 36). Nevertheless, the findings of the present study show that the REMS model is not more valuable than RAPS. Therefore, more studies are needed in this field to determine how much adding age and level of arterial oxygen saturation enhances the performance of these models.
Considering trauma patients as a separate group, it seems that using the RAPS system is the best option to identify high-risk cases among these patients. This means that considering age and O2 saturation, in addition to the RAPS components in the REMS criteria, lowers its diagnostic value. However, caution should be exercised in interpreting these results, because only 2 studies (4 analyses) on the diagnostic value of RAPS in trauma patients were included in our study; in one, the sample population was children, and the other had a cross-sectional design. Moreover, studies that defined the sample population as all acute patients included trauma patients, but it was not possible to separate these patients from the rest of the study population in our analyses. Therefore, there is a potential error that could not be eliminated due to the nature of this study as a systematic review, and further and more comprehensive studies are needed to investigate this issue. Various scoring systems have been proposed to classify the severity of injury. These systems include physiological, anatomical, and combined scores as well as specialized trauma scoring systems (37). Each of these systems has different limitations and advantages, but a scoring system that can be used in acute conditions should have few variables and be easily calculated. Almost all scoring systems include the level of consciousness as measured by the Glasgow Coma Scale (GCS). In addition to GCS, these scoring models use physiological criteria such as body temperature, respiration rate, blood pressure, and heart rate to determine the severity of the injury. Nonetheless, the question is whether adding these physiological criteria to GCS leads to a significantly better and more accurate diagnosis of injury severity. To answer this question, a study was conducted on 1,702 patients, which showed that the predictive value of GCS is similar to that of physiological scoring models.
The study concluded that GCS is the best model for predicting patient mortality as it is easier than physiological models to calculate and has fewer variables. Also, its predictive value is not significantly different from these models (38). Therefore, it is suggested that more studies be conducted to compare the value of physiological models with GCS. This study, like other retrospective studies, had its limitations. First, the quality of recording the clinical characteristics of patients in the emergency department could not be assessed. Also, the number of studies performed in each acute setting was different and limited. This study also had many strengths. The study population was a total of 737,351 patients, which is considerable; in addition, four physiological systems designed for evaluation of acute patients in emergency settings, including REMS, RAPS, mREMS and WPS, have been simultaneously studied in patients with trauma, sepsis, acute conditions, and non-traumatic acute conditions. It should be noted that most of the studies included were prospective cohorts.

Conclusion
The findings of the present study showed that RAPS, REMS and WPS have a high predictive value for in-hospital mortality. Also, the value of these models in trauma patients is much higher than in other patients. However, the number of articles on the REMS model is greater than that on the other two models, and since the DOR of this model is high in identifying high-risk patients, it is recommended to use REMS in acute conditions to identify high-risk patients.

Conflict of Interest
The authors declared no conflict of interest.

Funding
This study was funded and supported by a grant from Iran University of Medical Sciences.